10 research outputs found

    Anonymizing large transaction data using MapReduce

    Publishing transaction data is important to applications such as marketing research and biomedical studies. Privacy is a concern when publishing such data since they often contain person-specific sensitive information. To address this problem, different data anonymization methods have been proposed. These methods have focused on protecting the associated individuals from different types of privacy leaks as well as preserving the utility of the original data. But all these methods are sequential and are designed to process data on a single machine, and hence are not scalable to large datasets. Recently, MapReduce has emerged as a highly scalable platform for data-intensive applications. In this work, we consider how MapReduce may be used to provide scalability in large transaction data anonymization. More specifically, we consider how set-based generalization methods such as RBAT (Rule-Based Anonymization of Transaction data) may be parallelized using MapReduce. Set-based generalization methods have some desirable features for transaction anonymization, but their highly iterative nature makes parallelization challenging. RBAT is a good representative of such methods. We propose a method for transaction data partitioning and representation. We also present two MapReduce-based parallelizations of RBAT. Our methods ensure scalability when the number of transaction records and the domain of items are large. Our preliminary results show that a direct parallelization of RBAT by partitioning data alone can result in significant overhead, which can offset the gains from parallel processing. We propose MR-RBAT, which generalizes our direct parallel method and allows parallelization overhead to be controlled. Our experimental results show that MR-RBAT can scale linearly to large datasets and to the available resources while retaining good data utility.

    Genome-wide association identifies nine common variants associated with fasting proinsulin levels and provides new insights into the pathophysiology of type 2 diabetes.

    OBJECTIVE: Proinsulin is a precursor of mature insulin and C-peptide. Higher circulating proinsulin levels are associated with impaired β-cell function, raised glucose levels, insulin resistance, and type 2 diabetes (T2D). Studies of the insulin processing pathway could provide new insights about T2D pathophysiology. RESEARCH DESIGN AND METHODS: We have conducted a meta-analysis of genome-wide association tests of ∼2.5 million genotyped or imputed single nucleotide polymorphisms (SNPs) and fasting proinsulin levels in 10,701 nondiabetic adults of European ancestry, with follow-up of 23 loci in up to 16,378 individuals, using additive genetic models adjusted for age, sex, fasting insulin, and study-specific covariates. RESULTS: Nine SNPs at eight loci were associated with proinsulin levels (P < 5 × 10⁻⁸). Two loci (LARP6 and SGSM2) have not been previously related to metabolic traits, one (MADD) has been associated with fasting glucose, one (PCSK1) has been implicated in obesity, and four (TCF7L2, SLC30A8, VPS13C/C2CD4A/B, and ARAP1, formerly CENTD2) increase T2D risk. The proinsulin-raising allele of ARAP1 was associated with a lower fasting glucose (P = 1.7 × 10⁻⁴), improved β-cell function (P = 1.1 × 10⁻⁵), and lower risk of T2D (odds ratio 0.88; P = 7.8 × 10⁻⁶). Notably, PCSK1 encodes the protein prohormone convertase 1/3, the first enzyme in the insulin processing pathway. A genotype score composed of the nine proinsulin-raising alleles was not associated with coronary disease in two large case-control datasets. CONCLUSIONS: We have identified nine genetic variants associated with fasting proinsulin. Our findings illuminate the biology underlying glucose homeostasis and T2D development in humans and argue against a direct role of proinsulin in coronary artery disease pathogenesis.

    MR-RBAT: Anonymizing Large Transaction Datasets Using MapReduce

    Privacy is a concern when publishing transaction data for applications such as marketing research and biomedical studies. While methods for anonymizing transaction data exist, they are designed to run on a single machine, and hence are not scalable to large datasets. Recently, MapReduce has emerged as a highly scalable platform for data-intensive applications. In this paper, we consider how MapReduce may be used to provide scalability in transaction anonymization. More specifically, we consider how RBAT may be parallelized using MapReduce. RBAT is a sequential method that has some desirable features for transaction anonymization, but its highly iterative nature makes its parallelization challenging. A direct implementation of RBAT on MapReduce using data partitioning alone can result in significant overhead, which can offset the gains from parallel processing. We propose MR-RBAT, which employs two parameters to control parallelization overhead. Our experimental results show that MR-RBAT can scale linearly to large datasets and can retain good data utility.

    A parallel method for scalable anonymization of transaction data

    Transaction data, such as market basket or diagnostic data, contain sensitive information about individuals. Such data are often disseminated widely to support analytic studies. This raises privacy concerns, as the confidentiality of individuals must be protected. Anonymization is an established methodology to protect transaction data, which can be applied using different algorithms. RBAT is an algorithm for anonymizing transaction data that has many desirable features. These include flexible specification of privacy requirements and the ability to preserve data utility well. However, like most anonymization methods, RBAT is a sequential algorithm that is not scalable to large datasets. This limits the applicability of RBAT in practice. To address this issue, in this paper, we develop a parallel version of RBAT using MapReduce. We partition the data across a cluster of computing nodes and implement the key operations of RBAT in parallel. Our experimental results show that scalable anonymization of large transaction datasets can be achieved using MapReduce and that our method can scale nearly linearly with the number of processing nodes.
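
    The idea of partitioning transactions across nodes and running an operation on each partition in parallel can be illustrated with a minimal, single-process sketch of the map/reduce pattern. This is not RBAT or MR-RBAT: it only counts item support (the number of transactions containing each item), chosen here as a simple stand-in operation. The function names `partition`, `map_support`, `reduce_support` and `item_support` are hypothetical and do not come from the papers.

```python
from collections import Counter
from functools import reduce


def partition(transactions, n_parts):
    """Split the transaction list into n_parts chunks (round-robin assignment)."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def map_support(chunk):
    """Map step: within one partition, count how many transactions contain each item."""
    counts = Counter()
    for transaction in chunk:
        # set() ensures an item is counted at most once per transaction
        counts.update(set(transaction))
    return counts


def reduce_support(left, right):
    """Reduce step: merge two partial support tables by summing their counts."""
    return left + right


def item_support(transactions, n_parts=2):
    """Illustrative driver: partition, map each chunk, then reduce the partial results."""
    partials = map(map_support, partition(transactions, n_parts))
    return reduce(reduce_support, partials, Counter())


if __name__ == "__main__":
    # Toy transactions, invented for illustration only
    data = [["a", "b"], ["a", "c"], ["b"], ["a", "b", "c"]]
    print(item_support(data, n_parts=2))
```

    In a real MapReduce deployment the map calls would run on separate nodes and the framework would handle shuffling and merging; the sketch only shows why the result does not depend on how the data are partitioned.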

    Performance of concrete walls with waste and recycling materials for industrial building systems

    Lighter concrete walls significantly reduce dead loads. The central question here was how to produce concrete that reduces dead load; to this end, experimental tests were carried out on four scaled wall samples. The samples measured 640 × 220 × 30 mm, which is 1/5 of the real wall size used in Industrial Building Systems (IBS). The samples were: (a) Normal IBS wall (control sample), (b) Bottom ash IBS wall (used for 50% of the amount of sand), (c) Crushed brick IBS wall (used for 100% of the amount of sand), and (d) No-fines aggregate concrete IBS wall (without sand). For comparison, the samples were tested on the 28th day. The densities of type (a), type (b), type (c) and type (d) were 2355, 1974, 2038.2 and 1800 kg/m³, respectively. In terms of compressive strength, type (a) (control sample) was the strongest with 31 N/mm² and type (d) was the weakest with 8 MPa. The compressive strengths of type (b) and type (c) were 25 and 28 MPa, respectively. In the elastic modulus test, 22 GPa, 17 GPa, 22 GPa and 6 GPa were recorded for type (a), type (b), type (c) and type (d), respectively. In the flexural test on the walls, types (a), (b) and (c) had nearly adequate values of 17.7 MPa, 13.3 MPa and 15.8 MPa, respectively, while type (d) achieved the lowest value among the four walls with 8.1 MPa. With type (a) taken as the control sample, type (b) and type (c), unlike type (d), are appropriate for use in IBS wall construction due to their acceptable engineering properties (density, compressive strength, E-value and bending rupture).

    Pediatric patients with bicytopenia/pancytopenia: Review of etiologies and clinico-hematological profile at a tertiary center

    Background: The etiology of bicytopenia/pancytopenia varies widely in children, ranging from transient marrow viral suppression to marrow infiltration by fatal malignancy. Depending on the etiology, the clinical presentation can be with fever, pallor or infection. Knowing the exact etiology is important for specific treatment and prognostication. Aims: To evaluate the etiological and clinico-hematological profile in children with bicytopenia and pancytopenia. Materials and Methods: A review of bicytopenic and pancytopenic children referred for bone marrow examination from January 2007 to December 2008 was done. Detailed history, clinical examination and hematological parameters at presentation were recorded. Results and Conclusion: During the study period, a total of 990 children were referred for bone marrow examination for different indications. Of these, 571 (57.7%) had either pancytopenia (17.7%) or bicytopenia (40%). The commonest form of bicytopenia was anemia and thrombocytopenia, seen in 77.5% of cases, followed by anemia and leukopenia in 17.3% and leukopenia and thrombocytopenia in 5.5% of cases. The most common etiology was acute leukemia (66.9%) in bicytopenic children and aplastic anemia (33.8%) in pancytopenic children. Children with bicytopenia had a higher incidence of underlying malignancy (69.5% vs. 26.6%), splenomegaly (60.5% vs. 37.4%), lymphadenopathy (41.8% vs. 15.1%) and circulating blasts (64.6% vs. 20.1%) and a lower incidence of bleeding manifestations (12.1% vs. 26.6%) as compared to children with pancytopenia.